
GRU on top of ELMo embedding layer #12

Open
TharinduDR opened this issue Feb 11, 2019 · 3 comments

TharinduDR commented Feb 11, 2019

Hi,
I replaced my embedding layer with the ELMo embedding layer. The code looks like this -

embedding_layer = ElmoEmbeddingLayer()

# Embedded version of the inputs
encoded_left = embedding_layer(left_input)
encoded_right = embedding_layer(right_input)

# Since this is a siamese network, both sides share the same GRU
shared_gru = GRU(n_hidden, name='gru')

left_output = shared_gru(encoded_left)
right_output = shared_gru(encoded_right)

But I am running into an error: Input 0 is incompatible with layer gru: expected ndim=3, found ndim=2. The architecture worked well with the default embedding layer. Any idea what I am doing wrong?


hambro commented Mar 7, 2019

I think it is because ElmoEmbeddingLayer uses the default argument of Elmo (see the call() function), resulting in a fixed mean-pooling of all contextualized word representations with shape [batch_size, 1024].
This is documented in the Output section of the TensorFlow model.

signature='default' means the input is whole sentences, and that the model should perform the tokenization itself. The interesting part is the ['default'] dict lookup, which should be changed to ['elmo'].
This should return a tensor with the shape [batch_size, max_length, 1024].
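To illustrate the difference in plain NumPy (batch_size=2 and max_length=5 are made-up sizes, not from the thread):

```python
import numpy as np

# With signature='default' and the ['default'] key, ELMo returns one
# mean-pooled vector per sentence: ndim=2, shape [batch_size, 1024].
pooled = np.zeros((2, 1024))        # hypothetical batch_size=2
assert pooled.ndim == 2             # GRU rejects this: expected ndim=3

# A recurrent layer like GRU expects one vector per timestep:
# ndim=3, shape [batch_size, timesteps, features].
per_token = np.zeros((2, 5, 1024))  # hypothetical max_length=5
assert per_token.ndim == 3          # this is what GRU can consume
```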
I've also changed compute_output_shape to return (None, None, 1024), since we don't know how long each sequence is.
However, I still get some errors. Here is my updated, still not working, layer:

import tensorflow as tf
import tensorflow_hub as hub
from keras import backend as K
from keras.engine import Layer

class ElmoEmbeddingLayer(Layer):
    def __init__(self, **kwargs):
        self.dimensions = 1024
        self.trainable = True
        super(ElmoEmbeddingLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        self.elmo = hub.Module('https://tfhub.dev/google/elmo/2', trainable=self.trainable,
                               name="{}_module".format(self.name))

        self.trainable_weights += K.tf.trainable_variables(scope="^{}_module/.*".format(self.name))
        super(ElmoEmbeddingLayer, self).build(input_shape)

    def call(self, x, mask=None):
        result = self.elmo(
            K.squeeze(
                K.cast(x, tf.string), axis=1
            ),
            as_dict=True,
            signature='default',
            )['elmo']
        return result

    def compute_mask(self, inputs, mask=None):
        return K.not_equal(inputs, '--PAD--')

    def compute_output_shape(self, input_shape):
        return (input_shape[0], None, self.dimensions)
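One possible source of the remaining errors (my guess, not confirmed in this thread): compute_mask still runs on the raw string input of shape (batch_size, 1), so the mask it returns no longer matches the per-token output that a downstream GRU would consume. A minimal NumPy sketch of the mismatch (all shapes hypothetical):

```python
import numpy as np

# Hypothetical batch of two whole-sentence inputs, shape (2, 1),
# as they arrive at the layer before K.squeeze.
inputs = np.array([["the cat sat on the mat"], ["dogs bark"]], dtype=object)

# compute_mask compares the raw input against '--PAD--', so the mask
# inherits the input's shape (2, 1) ...
mask = inputs != "--PAD--"
assert mask.shape == (2, 1)

# ... while the layer now outputs (batch_size, max_length, 1024), and a
# recurrent layer expects a per-timestep mask of shape (batch_size, max_length).
max_length = 6  # hypothetical
expected_mask_shape = (2, max_length)
assert mask.shape != expected_mask_shape  # shapes disagree -> mask errors
```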

@jacobzweig (Contributor)

Thanks @hambro – I had sketched out a working prototype a while back – I'll see if I can dig something up and get it up here when I get a bit of free time.


yjiang18 commented Sep 6, 2019

I modified the elmo embedding layer as follows; the output shape is now [batch_size, seq_len, 1024], and you can put an LSTM/GRU on top of it:

class ElmoEmbeddingLayer(Layer):
    def __init__(self, mask, **kwargs):
        self.dimensions = 1024
        self.trainable = True
        self.mask = mask
        super(ElmoEmbeddingLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        self.elmo = hub.Module('https://tfhub.dev/google/elmo/2', trainable=self.trainable,
                               name="{}_module".format(self.name))
        self.trainable_weights += K.tf.trainable_variables(scope="^{}_module/.*".format(self.name))
        super(ElmoEmbeddingLayer, self).build(input_shape)

    def call(self, inputs, mask=None):
        # inputs.shape = [batch_size, seq_len]
        # This gives a list of seq_len repeated batch_size times:
        # [seq_len, seq_len, ..., seq_len], just like the official example.
        seq_len = [inputs.shape[1]] * inputs.shape[0]
        result = self.elmo(inputs={"tokens": K.cast(inputs, dtype=tf.string),
                                   "sequence_len": seq_len},
                           as_dict=True,
                           signature='tokens',
                           )['elmo']
        return result

    def compute_mask(self, inputs, mask=None):
        if not self.mask:
            return None
        output_mask = K.not_equal(inputs, '--PAD--')
        return output_mask

    def compute_output_shape(self, input_shape):
        return (input_shape[0], input_shape[1], self.dimensions)

The only problem is that you have to give the batch size explicitly to the Input layer. If your input layer currently looks like this:

text_input = Input(shape=(sequence_length,), dtype=tf.string)

you have to modify it to:

text_input = Input(batch_shape=(batch_size, sequence_length), dtype=tf.string)

This means the last batch might be smaller than batch_size, which raises an error when the model reaches the end of an epoch.
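One workaround for the short last batch (just a sketch, not from this thread; pad_to_batch_size is a hypothetical helper) is to pad the dataset so its length is a multiple of batch_size, e.g. by repeating the last sample:

```python
def pad_to_batch_size(samples, batch_size):
    """Repeat the final sample until len(samples) is a multiple of
    batch_size, so a fixed batch_shape Input never sees a short batch."""
    remainder = len(samples) % batch_size
    if remainder:
        samples = samples + [samples[-1]] * (batch_size - remainder)
    return samples

texts = ["a b c", "d e f", "g h i", "j k l", "m n o"]
padded = pad_to_batch_size(texts, batch_size=4)
assert len(padded) % 4 == 0  # 5 samples padded up to 8
```

The duplicated samples slightly bias training, so with this trick it is worth shuffling between epochs or masking the duplicates out of the loss.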
